DCU at the NTCIR-9 SpokenDoc Passage Retrieval Task
Abstract
We describe our runs and the results obtained for the “IR for Spoken Documents (SpokenDoc) Task” at NTCIR-9. Our participation focused on investigating segmentation methods that divide the manual and ASR transcripts into topically coherent segments. The underlying assumption of this approach is that these segments will capture passages in the transcript relevant to the query. Our experiments investigate two lexical-coherence-based segmentation algorithms (TextTiling and C99). These are run on the provided manual and ASR transcripts, and on the ASR transcript with stop words removed. Evaluation of the results shows that TextTiling consistently performs better than C99, both in segmenting the data into retrieval units, as evaluated using the centre located relevant information metric, and in having higher content precision in each automatically created segment.
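The idea of cutting a transcript where lexical cohesion drops can be illustrated with a minimal sketch. This is a simplified TextTiling-style procedure written for illustration only, not the system used in the runs described above. The function name `segment_by_lexical_cohesion`, the block size, and the boundary cutoff (mean depth plus half a standard deviation) are all assumptions, and the input is assumed to be already tokenised into units such as utterances.

```python
from collections import Counter
from math import sqrt
from statistics import mean, pstdev


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def depth(sims, i):
    """Depth of the similarity valley at gap i, measured against the
    nearest peak reached by climbing left and right."""
    left = i
    while left > 0 and sims[left - 1] >= sims[left]:
        left -= 1
    right = i
    while right < len(sims) - 1 and sims[right + 1] >= sims[right]:
        right += 1
    return (sims[left] - sims[i]) + (sims[right] - sims[i])


def segment_by_lexical_cohesion(units, block=2, k=0.5):
    """Split a list of token lists into topically coherent segments.

    block: number of units compared on each side of a gap (assumed value).
    k: cutoff strictness; a boundary is placed where the depth score
       exceeds mean + k * stdev (an assumed cutoff for this sketch).
    """
    if len(units) < 2:
        return [units] if units else []
    # Similarity between the blocks on either side of every gap.
    sims = []
    for gap in range(1, len(units)):
        left = Counter(t for u in units[max(0, gap - block):gap] for t in u)
        right = Counter(t for u in units[gap:gap + block] for t in u)
        sims.append(cosine(left, right))
    depths = [depth(sims, i) for i in range(len(sims))]
    cutoff = mean(depths) + k * pstdev(depths)
    # Gap i lies between units[i] and units[i + 1].
    segments, start = [], 0
    for i, d in enumerate(depths):
        if d > cutoff:
            segments.append(units[start:i + 1])
            start = i + 1
    segments.append(units[start:])
    return segments
```

Each returned segment could then be indexed as one retrieval unit. Removing stop words before calling the function corresponds to the stop-word-filtered ASR condition mentioned in the abstract.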
Similar resources
DCU at the NTCIR-11 SpokenQuery&Doc Task
We describe DCU’s participation in the NTCIR-11 SpokenQuery&Document task. We participated in the spokenquery spoken content retrieval (SQ-SCR) subtask by using the slide group segments as basic indexing and retrieval units. Our approach integrates normalised prosodic features into a standard BM25 weighting function to increase weights for terms that are prominent in speech. Text queries and re...
Spoken Document Retrieval by Contents Complement and Keyword Expansion Using Subordinate Concept for NTCIR-SpokenDoc
We report on the result of investigating which relationship is important among hypernym and hyponym relationships in retrieval keyword expansion. Moreover, we report the effect of the keyword expansion and the contents complement for spoken document retrieval for SCR lecture retrieval task and SCR passage retrieval task. Spoken Document Retrieval by contents complement and keyword expansion usi...
Spoken Document Retrieval Experiments for SpokenDoc at Ryukoku University (RYSDT)
In this paper, we describe the spoken document retrieval systems of Ryukoku University, which participated in the NTCIR-9 IR for Spoken Documents (“SpokenDoc”) task. The NTCIR-9 “SpokenDoc” task has two subtasks: the “Spoken term detection (STD) subtask” and the “Spoken document retrieval (SDR) subtask”. We participated in both subtasks as team RYSDT. In this paper, first, our STD systems are de...
DTW-Distance-Ordered Spoken Term Detection and STD-based Spoken Content Retrieval: Experiments at NTCIR-10 SpokenDoc-2
In this paper, we report our experiments at the NTCIR-10 SpokenDoc-2 task. We participated in both the STD and SCR subtasks of SpokenDoc. For the STD subtask, we applied a novel indexing method, called metric subspace indexing, that we proposed previously. One of the distinctive advantages of the method is that it can output the detection results in increasing order of distance without using any predefined...
Publication date: 2011